Configurable space-time performance trade-off in multidimensional data base systems

ABSTRACT

A multi-dimensional database includes data organized into hierarchical levels. For each level within a dimension, predefined thresholds are set to determine which intersections are precomputed and which are not. The general result is to precompute intersections which take relatively longer to compute at run time, and to not precompute intersections which can be calculated relatively quickly at run time. When multiple dimensions of data are provided, a step-by-step process uses the level threshold for each dimension to determine whether data is to be prestored at an intersection. An odd-even calculation insures that computations performed at run time are relatively efficient. The technique results in good run time response, while not requiring maximum storage or recalculation time for data updates. By adjusting the thresholds at the individual dimension level, response of the multi-dimensional database can be changed as desired.

CROSS-REFERENCE TO RELATED APPLICATIONS

This Application claims the benefit of U.S. Provisional Application No.60/151,964 filed Sep. 1, 1999. In addition, it contains subject matterin common with U.S. patent application Ser. No. 09/653,107, filed oneven date herewith, titled MULTIDIMENSIONAL DATA BASE SYSTEM WITHINTERMEDIATE LOCKABLE INTERSECTIONS, which is assigned to the assigneehereof and incorporated by reference hereinto.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to data storage in computersystems, and more specifically to a method for optimizing data storageand retrieval in database systems.

2. Description of the Prior Art

With the continued increase in computer system processing power and datastorage capabilities, increasingly large databases are being made widelyavailable. Simple database designs which work for small databases aregenerally unsuitable for very large databases, because access times tendto increase geometrically with the size of the database. Even withgreatly enhanced processing power, times to access very large databasescan become unmanageably long unless care is taken in design of thedatabase.

In many large database systems, it is common to perform many more readaccesses of the database than updates. For example, corporate currentand historical sales information may be widely available within thecompany, with constant read access to the database provided. However,updating the data stored within the database can occur relatively lessfrequency. In systems having this type of access pattern, it isimportant to minimize the read access time in order to enhance overallsystem performance.

One database design in common use at the present time is OLAP. Thisdatabase design technique can be run on several different types ofunderlying database engines, such as those commercially available fromHyperion, Oracle, and i2 Technologies. In addition to 1-dimenstionalOLAP database structures, of particular interest to the presentinvention are MOLAP (Multidimensional OLAP) systems.

In such systems, some types of data, often referred to as aggregateddata, can be pre-computed in order to improve read access times. Datathat is suitable to be aggregated is that wherein, in a multidimensionaldatabase, data for a higher level may be generated using the data forlower level members within the structure. For example, unit sales of aproduct can be aggregated in a database which defines multiple salesterritories within a sales region; the aggregate unit sales for theregion is the sum of the sales for the individual territories.

In order to improve read access times, such aggregatable data can beprecomputed, and stored in various locations within the database. In theabove example, the region sales data for a product can be stored withinthe region entry, so that it can be directly accessed at run time byreading each of the sales territory numbers once, and storing the sum inthe corresponding region entry. It is not necessary to access each ofthe sales territories when total sales for the region is desired.

MOLAP engines traditionally precompute all such aggregatable data. Thisleads to fast read access, but requires a relatively large amount ofstorage space. In general, the number of storage locations needed isequal to the number of elements in the cross-product of all of themembers for each dimension. For example, if a two-dimensional databasehas 1,000 members in a region dimension, and 1,000 members in a productsdimension, 1,000×1,000=10⁶ intersections that must be precomputed.Storage of aggregateable data into every one of these intersections isexpensive in terms of space.

In addition, whenever updates are performed to the database,precomputing all intersections affected by the update is relativelyexpensive in terms of processing time. Although the principal goal ofthe design is to minimize read access, doing so at the expense ofextremely long write access times for updates lowers overall systemperformance.

Performing run time calculations minimizes the time required to updatethe database, but can be very slow at read time because of the largenumber of read accesses which are needed to calculate the number.

In order to optimize system operation, a number of techniques have beenimplemented in MOLAP systems which provide that only selectedintersections are precomputed, while others are computed at run time.One approach is to store only a single data value if a parent and childintersection, or node, have a one-to-one relationship. Another approachis to have a user define, in advance, which nodes should be precomputed,and which should be computed on the fly at run time. This secondapproach provides the possibility of optimizing a database based uponthe particulars of an application, but requires a certain amount ofknowledge by an administrative user. It also becomes very difficult orimpossible to implement when multiple dimensions, each having a largenumber of members, are incorporated in the database. In addition, whenchanges are made to the database, a user is required to again make aseries of optimizing determinations.

It would be desirable to provide a multi-dimensional database systemwhich had the capability to provide a reasonable optimization of aspace-time trade off for a multi-dimensional database. It would befurther desirable for such system and method to be relatively easy todefine on the part of a user, and to adapt automatically to changes madeto the database.

SUMMARY OF THE INVENTION

In accordance with the present invention, a multi-dimensional databaseincludes data organized into hierarchical levels. For each level withina dimension, predefined thresholds are set to determine whichintersections are precomputed and which are not. The general result isto precompute intersections which take relatively longer to compute atrun time, and to not precompute intersections which can be calculatedrelatively quickly at run time. When multiple dimensions of data areprovided, a step-by-step process uses the level threshold for eachdimension to determine whether data is to be prestored at anintersection. An odd-even calculation insures that computationsperformed at run time are relatively efficient. The technique results ingood run time response, while not requiring maximum storage orrecalculation time for data updates. By adjusting the thresholds at theindividual dimension level, response of the multi-dimensional databasecan be changed as desired.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are setforth in the appended claims. The invention itself however, as well as apreferred mode of use, further objects and advantages thereof, will bestbe understood by reference to the following detailed description of anillustrative embodiment when read in conjunction with the accompanyingdrawings, wherein:

FIG. 1 is a high level block diagram of a database system;

FIG. 2 is a representation of the levels within two database dimensions;

FIG. 3 is a tree diagram illustrating various features of a databasestructure; and

FIG. 4 is a table illustrating numbers of members at various levels ofthe dimensions defined in FIG. 2.

FIG. 5 is a table of threshold values to be used with the example ofFIG. 4; and

FIG. 6 is a pseudo-code definition of an algorithm which implements thepreferred method.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The technique of the present invention can be used in different types ofmultidimensional database systems. It is particularly useful in a MOLAPengine running on any of a number of widely available database engines.

A high level diagram of a typical system on which the softwareimplementing the invention can be run is shown in FIG. 1. The system 10runs on system hardware 12. System hardware 12 is intended to berepresentative of both the physical hardware of the computer system, andvarious operating system and other low-level utilities. A databaseengine 14 runs on system hardware 12, and can be any engine such as iswidely available from numerous sources. A MOLAP engine 16 runs on thedatabase engine 14, and is accessed through a user interface 18. Exceptfor the improvements of the invention described below, each of thesepieces can be a generally available component as is well known in theprior art.

The following description includes definitions of some basic terms usedin the remainder of the description. Several examples, and the processesof the inventive method, are then described, followed by a detaileddescription of the operating properties of a suitable MOLAP system inwhich the invention is preferably practiced.

A dimension is a logical grouping of unique entities that are calledmembers of the dimension. Each of these members is uniquelyidentifiable. Each dimension is uniquely identified by a name.

A hierarchical dimensional is a dimension whose members are partitionedinto named levels. These levels have a partial order imposed on them.The partial order between levels is transitive: If A is related to B,and B is related to C, then A is related to C. Members of any tworelated levels have a one-to-many relationship defined between them.These levels are called higher (ancestor) and lower (descendant) levelsrespectively.

If level Y is the ancestor of X, then the relationship between thelevels can be represented as Y>X. A parent of a level X is the ancestorlevel Y where there does not exist a level Z such that Y>Z>X. If Y is aparent of X, X is a child of Y. This relation is a partial ordering, asa level can have both more than one parent, and more than one child. Forexample, level A can have two children B and C (A>B and A>C), and B andC can share a child level D. (B>D and C>D) In such a case, B and Cdefine parallel paths, and need not be related.

A dimension intersection for a set of dimensions is a set of memberswhere one member belongs to each dimension. An intersection can havemore than one member from a single dimension, but for purposes ofsimplicity, the description below will give examples in which eachintersection includes one member from each dimension involved in theintersection.

A data element is preferably treated as a (property, value) pair. Theproperty represents the name of a data measure which results in a valuegiven a member in every dimension that the data depends upon. A datameasure, generally, is a property that is dependent on a subset of thedefined dimensions. A data measure yields a scalar value at anintersection of its dependent dimensions.

An aggregatable data measure is defined with a lower bound level in eachof its dependent dimensions. The data measure can have data values for asubset of intersections of lower bound level members of every dependentdimension. Aggregateable data measures use an aggregation method togenerate data for an ancestor level member given data for all itschildren at one of its descendant levels. An example of an aggregationmethod is summation, i.e., summing of values in descendant levels to bestored in an ancestor level.

FIG. 2 gives examples of two dimensions which are used in the subsequentdescription. Geography dimension 20 includes four levels of its membersin a hierarchical order: National level 22, region level 24, districtlevel 26, and territory level 28. In a similar manner, product dimension30 includes four hierarchical levels: All products 32, category 34,brand 36, and SKU (Stock-Keeping Unit) 38.

As a simple example, members of each dimension can have astraightforward hierarchical relationship with members of its ancestorand descendant levels. FIG. 3 shows an example in which the region levelhas 4 members, region 1, region 2, region 3, and region 4. Each regionhas some number of child districts, which varies by region. In theexample of FIG. 3, region 1 has 10 districts, region 2 has 20, region 3has 5, and region 4 has 15 districts.

Only the five district members of region 3 are individually illustratedin FIG. 3. In addition, only the six territory level members of region3-district 3 are illustrated. The territories of the remaining districtsof region 3, and further remaining districts of the other regions, arenot shown in FIG. 3 for simplicity.

In FIG. 3, region 3 is a parent of each of its districts 1, 2, 3, 4, and5. It is an ancestor, though not a parent, of each of the territories1-6 within region 3.

In a similar manner, product dimension 30 will have a number ofcategories, brands, and SKUs. As an example, three product categoriescould be defined, having 5, 4, and 3 brands respectively. Each brand hasa number of SKUs. For the purposes of the following description, FIG. 4is a table illustrating the number of members at each level for both thegeography and product dimensions.

Referring to FIG. 4, within the geography dimension only region 3 isbroken out in detail, corresponding with the graph of FIG. 3. Theproduct dimension, having less members at intermediate levels, is shownin more detail. There are a total of 466 members in the geographydimension, and 761 members in the product dimension. Assumingintersections can have only one member from each dimension, this simpledatabase has 354,626 intersections.

For purposes of the remaining description, it will be assumed that anaggregateable data measure of interest is unit sales. Considering theproduct dimension, actual unit sales are provided at the SKU level. Unitsales may be aggregated, by summing, at the brand category and allproducts level.

Prior art techniques provide for a knowledgeable user to define whichmembers of the product dimension are aggregated at runtime, and whichare precalculated. According to the present invention, members which areprecalculated versus those which are calculated at runtime are definedby a threshold value assigned on a level basis. For example, if a memberat any given level has more children than the threshold defined for thatlevel, unit sales for all of its descendants will be aggregated andstored within that member. If a member has less children than thethreshold, the aggregated data may or may not be stored there asdescribed below.

As is normal in systems of this type, the database contains a dimensiontable for each dimension. Each entry in the dimension table correspondsto a member of that dimension, and contains various identification,control information, and data for that member. Actual database data isstored in the fact table which, potentially, includes an entry for eachintersection in the database. Actual data, such as unit sales andaggregated unit sales, is stored in the data table entries for thecorresponding intersections.

To implement the method of this invention, three pieces of informationare maintained for every member of a dimension. This information ismaintained in the dimension table. These items of information are: a“count” to keep track of the number of descendants required to computethe member, a “flag” which indicates if the member is “stored” or“not-stored”, and the child level containing the children to beaccessed. This latter item of information is used for hierarchies havingmultiple child levels, to identify which level to use.

For each member within a dimension, it is necessary to determine whetherthe aggregateable data, in this case unit sales, is to be stored ornot-stored for that member. This is done by comparing the count of eachmember to the threshold for that level. In general, if the count for amember is greater than or equal to the threshold level, the unit salesdata is aggregated and stored for that member. If the count is less thanthe threshold value, that member is marked as not-stored.

Several examples will be given with respect to the example numbers shownin FIG. 4. FIG. 5 gives a list of threshold values for each level ofboth the geography and product dimensions. These levels have been set inadvance by a user, preferably one who has some idea how settingthreshold levels will impact the performance of the database. Thesethreshold levels can be changed at any time by an appropriateadministrative user.

As shown in FIG. 5, the territory and SKU levels within the geographyand product dimensions, respectively, do not have a threshold value.This is because these levels do not have any child levels, andtherefore, all sales data must be stored at these levels. Referring tothe product dimension, the threshold for the brand level has been set at50. This means that any brand having 50 or more SKUs will beprecalculated and stored within the brand member. Those having less willnot be precalculated unless certain criteria are met as described below.As shown in FIG. 4, none of the three brands in category 3 will beprecalculated. Brands 2, 3, and 4 within category 2 will beprecalculated, and brands 3 and 4 within category 1 will beprecalculated. This saves significant time when reading the database,because these brands, each having a large number of SKUs, contain theprecalculated aggregate total of all of their children. This savessignificant computation at runtime.

Looking at the categories within the product dimension, category 1 willcontain a precomputed unit sales value, because it contains five brands.This will also be the case with category 2. Category 3, at leastinitially, will not need to be precomputed. In fact, this would remainthe case assuming that each of the brands 1-3 within category 3 had apre-computed aggregate value for unit sales.

However, in the example shown in FIG. 4, the brand members 1-3 withincategory 3 are themselves not aggregated. Thus, in order to calculateunit sales for category 3, it will be necessary to traverse the treedown to the SKUs for each of brands 1-3 within category 3. This is atotal of 40 SKUs, plus 3 brands, giving 43 nodes which must be traversedto compute unit sales for category 3. In this case, sales for category 3would be precomputed because 43 is greater than the threshold level of4. If the threshold level for categories had been set at 50, noprecomputation would take place for either category 3 or brands 1, 2,and 3.

Further, if the category threshold had been set at 50, category 2 wouldnot be precalculated. This is because each of brands 2, 3, and 4 incategory 2 was precalculated, giving each of these brands a count of 1.Brand 1 on category 2 is not precalculated, and has a count of 10.Therefore, the total count for category 2 would be 13, less than athreshold of 50.

If the category threshold were 50, category 1 would still bepre-computed, because only brands 3 and 4 within category 1 areprecomputed. Brands 1, 2 and 5 are not precomputed, giving category 1 acount of 10+20+1+1+35=67. Any category threshold less than 67 causedcategory 1 to be pre-computed.

FIG. 6 is a psuedo code definition of the algorithm used to determinewhether the flag for any particular member is marked as stored ornot-stored. Initially, if the level does not have any child levels, allmembers have the flag marked as stored. If the level does have any childlevels, each member of that level must be considered separately. Anested series of loops is used, as shown, to traverse the nodes of thevarious levels and determine whether the flag for each member is set asstored or not-stored.

For each member, for every child level of that member, the member'scount is initially set to 0 and then incremented for every member of thechild level. If the child's flag is stored, one is added to the currentmember's count. If the child's flag is not-stored, the child's count isadded to the current member's count. The minimum computed count for allof the different child levels (and there will be only one in a simplehierarchical example such as shown in FIG. 4) is stored in the countentry for that member. If that count is greater than a threshold valuefor that level, the flag is marked as stored, otherwise it is marked asnot stored. The level which yields this minimum count is stored as thechild value for that level.

Operation of this algorithm can be seen with respect to FIGS. 3 and 4.Assume that the level currently under consideration is the region levelof the geography dimension. For member region 3 of this level, there arefive children. Initially, the count of Region 3 is set to zero. Then foreach of members 1 through 5, a determination is made as to whether theflag for that child is stored. Given the thresholds shown in FIG. 5 forthe geography dimension, the stored flags are marked for children 1, 3and 4, because these children have four or more territories in thethreshold as four. For each of these children out of Region 3, one isadded to account for Region 3.

Of the remaining two children of Region3, Districts 2 and 5, the flag ismarked as not-stored. For these two children, the count of each child isadded to the count of Region 3. For District 2, the count is 2, and forDistrict 5 the count is 1. This give a total count for Region 3 of 6.

Because, through the algorithm of Figure of 6, the minimum count of 6 isstored, and is compared to the region threshold of 10. Because 6 is lessthan 10 the flag is marked not stored. Also, the level which yields thiscount is district, so a pointer to the district level is stored as thechild pointer for Region 3.

This looping algorithm is applied to every member of each dimension. Asdescribed further below, it is not applied to intersections ofdimensions, but only to the members of the dimensions themselves.

If the data scheme is one dimensional, for example only the geographydivision existed within the database, the storage scheme for precomputeddata within the database is simple. Data is not stored at the membersmarked not-stored, and it is stored at members whose flag is marked asstore. When accessing unit sales at runtime, if a member with anot-stored flag is encountered, the members of the child level are usedto compute the sales numbers for the member of interest. If any of thechildren are also marked not-stored, its children are checked in arecursive arrangement.

Important space savings of the present invention are primarily achieved,however, when this approach is extended to multiple dimensions. It isnot necessary to keep a stored/not-stored flag for every intersectionwithin the database; it is only necessary to keep this information foronce for each member of each dimension.

The algorithm for the multi-dimensional case is described in thefollowing paragraphs. Consider a data measure M dependent on ndimensions D1, D2. Dn . . . M is a subset of the cross product of thedimensions D1, . . . , Dn. Each member of M is a n-tuple where eachcomponent of the tuple belongs to one of the dimensions. As describedabove each such component is marked stored or not-stored by the methoddescribed above.

There are two cases to be considered:

-   -   i) The number of components of an intersection marked not-stored        is even        -   a) The number of components marked not-stored is zero        -    The intersection is stored and is directly accessed        -   b) All the components marked not-stored have a count of 1        -    The intersection is not stored in the data measure.        -    When the intersection is accessed, the system navigates            down all the dimensions containing the components marked            not-stored. The child level of each component described            above is used for the navigation. Since the count is 1 for            all components, only one child marked stored will be found            in each dimension. All components are replaced with members            of the corresponding dimensions that are found (marked            stored). The intersection thus formed is accessed (case a            above).        -   c) At least one of the components marked not-stored have a            count greater than 1        -    The intersection is stored and is directly accessed    -   ii) The number of components of an intersection marked        not-stored is odd    -    The intersection is not stored in the data measure.    -    When the intersection is accessed, the dimension containing the        component marked not-stored with the minimum count is selected.        The system navigates down the dimension making use of the child        level information and collects all the members marked stored.        The number of such members equals count. The component of the        original intersection is replaced by each of the members        collected forming count intersections. Each intersection thus        formed has one less component marked not-stored making the        number of not-stored components even. These intersections can be        accessed using the methods described in case i above using which        the original intersection can be computed.

In a system with n dimensions D1, D2, . . . Dn with number of membersM1, M2, . . . Mn respectively, this scheme uses additional spaceproportional to M1+M2+ . . . +Mn but saves space proportional to M1 * M2. . . Mn (the unstored intersections). This scheme guarantees that thenumber of intersections accessed to calculate the data at anintersection does not exceed the maximum of the thresholds associatedwith the levels to which the components of the intersection belongs.

In the example used in this description, with two dimensions, there arefour possibilities for an intersection, because there are two membersfor the intersection, with each member having two states for its flag.The possibilities are for both members' flag to indicate stored, bothindicating not-stored, and one flag indicating stored with the otherindicating not-stored (two instances).

As described by the multi-dimensional algorithm, if one member is markedstored while the other is marked not-stored, the data measure (unitsales) is not stored at that intersection. If both members are markedstored, the number of not-stored flags is 0, the unit sales data measureis aggregated and stored at that intersection. If both components aremarked not-stored, whether the data measure is stored at thatintersection will depend upon whether or not either of the componentmembers have a count greater than one. If at least one of them does, thedata is aggregated and stored at that intersection. If both have a countof one, the data is not stored at that intersection.

As an example, consider the two dimensional example of FIGS. 4 and 5.One intersection is that of category 3 with region 4. Unit sales forthis intersection represents unit sales of category 3 products in region4. To determine whether sales data is to be aggregated and stored atthis intersection, (i.e., an entry for this intersection is made in thedata table), the flags for region 4 and category 3 are examined. Bothmembers are marked stored, meaning there are zero not-stored flags.Therefore, unit sales data is stored at this intersection. It is notnecessary to provide a flag for this intersection because, its status isdeterminable from the flags of its two members.

Now, assume that the threshold for the category level was changed to 50,as described previously. In this case, category 3 would be marked notstored. The intersection of category 3 and region 4 now has an oddnumber of not-stored flags (1 flag marked not stored), so no unit salesdata is stored at that intersection. When this number is needed, aruntime calculation is performed. As described above, this determinationis made based only on the flags of the members included in theintersection. Thus, no flags are needed for the numerous possibleintersections, and no data will be stored (i.e., entries made in thedata table) for many of such intersections, thereby saving storagespace.

The alternating effect caused by using odd or even numbers of not-storedflags to control whether data is stored at the intersection helps ensurethat no runtime searches take too long. If data is not stored at anintersection, only a relatively small number of steps will need to befollowed in order to meet the aggregation (e.g., summation) of the dataneeded to define that intersection.

Although the algorithm described above of setting thresholds andtraversing nodes of a database tree can be applied to different databasedesigns, the preferred embodiment is more completely defined to haveparticular characteristics. The characteristics of principal importanceare now described as a set of properties for elements of the database,with some examples given to illustrate them.

Dimension

Description

A dimension is a logical grouping of unique entities that are calledmembers of the dimension. Every member of a dimension has a code and adescription associated with it. Dimensions have a unique name and anynumber of unique aliases associated which can be used to refer to it.

Example

Geography dimension has members Eastern Region, Western Region, New YorkTerritory and so on.

Dimension Intersection

Description

A dimension intersection for a set of dimensions is a set of memberswhere one member belongs to each dimension. An intersection can havemore than one member from a single dimension. Each of these dimensionsneeds to be identified with a unique alias name.

Example

-   [Brand Cookies, Region Eastern, January 95] is one intersection of    the [Product, Geography, Time] dimension set.    Data    Description

A data element can be viewed as a (property, value) pair. Propertyrepresents the name of a data measure which results in a value given amember in every dimension that the data depends on. The data may notrequire identification of members in all defined dimensions. Dimensionson which the data measure depends are called dependent dimensions andthose on which it does not depend are called independent dimensions.Data can depend on the same dimension more than once in which casemultiple positions of the dimension need to be identified to obtain thedata values.

Dimensions can be classified as sparse or dense depending uponavailability of data elements. A dense dimension has meaningful data fora subset of its members irrespective of the members of the otherdimensions. A sparse dimension is one where the availability of a dataelement depends on members of the other dimensions. Sparseness ordensity of a dimension can vary by data element.

Example

Unit Sales, Price are all data measures. (Unit Sales, 1000)—identifiesthe value for Unit Sales at Eastern Region, Cookies Brand and Q1-1997.Unit Sales depends on Geography, Product and Time. Whereas Price isdependent on Product and Time and is independent of Geography. Time is adense dimension for the measure Unit Sales and sparse dimension forPrice.

Hierarchical Dimension

Description

Members of a hierarchical dimension are partitioned into named levels.These levels have a partial order imposed on them. The partial orderbetween levels is transitive: If A is related to B, and B is related toC, then A is related to C. Members of any two related levels have aone-to-many relationship defined between them. These levels are calledhigher (ancestor) and lower (descendant) levels respectively.

-   -   If level Y is the ancestor of X, then Y>X (read as, “Y is        ancestor of X”). A parent of a level X is the ancestor level Y        where no level Z exists such that Y>Z>X.

In case of sequenced dimensions, a sequence is imposed within themembers of a level. If a sequence is imposed on multiple levels of adimension, the sequence needs to be consistent across the levels. Inother words, if sequenced level X is an ancestor of sequenced level Yand member x[1] of level X is before member x[2], then any y[i] which isa child of x[1] is before any y[j] which is a child of x[2].

Every level in a dimension can have user specified thresholds which areused to determine the storage strategy of aggregatable measures. Thesystem keeps track of the minimum number of child members that need tobe accessed to compute data at every member of a dimension. Every memberis flagged “to be stored” if the count is more than the upper thresholdand “not to be stored” if the count is less than the lower threshold.The flag is changed to “to be stored” when the count goes above theupper threshold and to “not to be stored” when the count goes below thelower threshold.

Example

Quarter and Month are names of levels of the Time Dimension. Region andDistrict are names of levels of the Geography Dimension.

Product Dimension can have All Products, Brand, Size and SKU as levelswith the order

All Products>Brand, Brand>SKU, All Products>Size, Size>SKU. All Productsis higher than Brand and Brand is an ancestor of SKU. Brand and Size arenot related to each other. Members of Size could be Big, Medium andSmall.

The Cookies Brand is a member of level Brand and is the parent ofChocolate Chip, Oatmeal Raisin and Macadamia Nut SKUs.

The month is a sequenced level of dimension Time and may have membersJanuary 95 . . . December 96 with a system recognized sequence. Quarteris an ancestor of Month and is sequenced. If Qtr1 is before Qtr2 thenall months in Qtr1 are before all the months in Qtr2.

Aggregation Path

Description

Given a member x of a level X, there exists a member y related to x inevery ancestor level Y of X. This is called the set of ancestor membersof x. The members in the ancestor set of x may or may not be related toeach other.

Aggregation path of member x, is defined as a subset of ancestor membersof x that have a total order imposed on them. In other words, every pairof members in the aggregation path are related to each other.

Example

In the hierarchical dimension example, one aggregation path forChocolate Chip cookies contains Cookies Brand and All Products (the onlymember of the All Products level).

Multiple Hierarchies or Multiple Aggregation Paths

Description

Each member can have more than one aggregation path to facilitategrouping by different attributes which are members of unrelated levels.

Thus a member can have multiple parents resulting in multiplehierarchies in a dimension.

Example

SKU can be grouped by Brand or Size. The two aggregation paths for amember of the SKU level are Cookies Brand-AllProducts andBig-AllProducts. The parents of Chocolate Chip (SKU) could be Cookies(Brand) and Big (Size). The two hierarchies of the Product dimension areAllProducts-Brand-SKU and AllProducts-Size-SKU.

Split Dimension

Description

A dimension can be split in such a way that different levels can beindependently selected to identify a data element.

Splitting is done by selecting a level X as one of the dimensions. Theuniqueness of this dimension will remain the same. All the levels of theoriginal dimension which have X as one of their ancestors form the newdimension and the relations among these levels remain unchanged. Theuniqueness of the newly formed levels is obtained by subtracting theuniqueness of X from their original uniqueness. Dimensions can be splitat run-time. Split dimensions can be split further.

Example

January 1995 could be defined as an unique member of the level month ofTime Dimension. Another way to define it is as member January of levelMonth and a descendent of member 1995 of level Year. If the lattermethod is used to define the Month and Year levels, then the Timedimension can be split between Year and Month and viewed in aspreadsheet with Month going down (rows) and Year going across(columns).

Combined Dimension

Description

A Combined dimension is a combination of two or more dimension. Allmembers of the Cartesian product of the dimensions to be combined aremembers of the combined dimension. Two levels of a combined dimensionare related if one component of one of the levels is related to thecorresponding component of the second level provided all the othercomponents match. A combined dimension can be defined at run time and isuniquely identified by a name.

Example

Consider two dimensions defined by a partially ordered set of levels.Let {A1, A2, A3, A4} be the levels of dimension A and {B1, B2, B3, B4}be the levels of dimension B. Then the set of levels {A1B1, A1B2, A1B3,A1B4, A2B1, A2B2, A2B3, A2B4, A3B1, A3B2, A3B3, A3B4, A4B1, A4B2, A4B3,A4B4} is the new formed combined dimension.

If A1>A2 then A1X>A2X for all X belonging to dimension B.

Uniqueness

Description

Every member of a level is identified using a code. The member code maynot be unique by itself but requires to be unique within the dimensionwhen qualified by all its ancestor member codes. The uniqueness of alevel is the set of all its ancestor levels.

Example

In the split dimension example, the code assigned to January could be01, February—02, Year 1995—95, Year 1996—96, and so on. To uniquelyidentify Month January of Year 1995, the combined code of Month and Yearwhich is 9501 is needed. The uniqueness of level Month contains theMonth and Year

Adding Dimension Members

Description

Dimension members can be loaded at any level by specifying all thecomponents of the uniqueness of the level. The system will add theancestor members to all ancestor levels if required. In other words, allmembers of a dimension need to have all their ancestors defined but neednot have their descendants defined. Adding dimension members is amulti-user operation and is available to every user with the requiredaccess.

Consolidated Level

Description

A consolidation of a level X within an ancestor level Y is defined asthe set of unique members that result by adding level X to theuniqueness of Y and applying the resulting set to level X. The membersof the new level Z thus formed are assigned new descriptions and thelevel has the following relations:Y>ZZ>X

A consolidation of a level X can also be defined as the set of uniquemembers within X. The uniqueness of the new level Z will contain itselfonly. In this case, the new level Z has the following relation:Z>X

Consolidated levels are definable at run-time. Consolidated levels canbe used in all operations where a pre-defined level can be used.

In case of aggregatable data measures, the defined aggregation method isused to compute data at the members of the new level. In case ofnon-aggregatable measures, the default access method defined for themeasure is used to generate data.

Example

If all the corresponding months of every year are assigned the same code(code of January is 01 in years 1995, 1996 and 1997 and so on) then aconsolidated level called “Consolidated Month” could be defined atrun-time for level Month within the AllYears level to compare data bymonths across all years. The data for January will be an aggregation ofJanuary 1995, 1996, 1997 and so on.

The uniqueness of the Month level is {AllYears, Year, Month}. Theuniqueness of Consolidated Month is formed by adding Month to theuniqueness of AllYears: {AllYears, Month}

Custom Level

Description

A custom level of a level X within an ancestor Y is defined by mergingsub-groups of members of X that are related to a single member of Y.This could be performed for multiple members of Y. The members of newlevel Z thus formed are assigned new codes and descriptions and thelevel has the following relations:Y>ZZ>X

Custom levels can also be defined on level X without being bound by anancestor level. In this case the new level defined Z has the followingrelation:Z>X

Custom levels are definable at run-time. Custom levels can be used inall operations where a pre-defined level can be used.

In case of aggregatable data measures, the defined aggregation method isused to compute data at the members of the new level. In case ofnon-aggregatable measures, the default access method defined for themeasure is used to generate data.

Example

A new grouping attribute called Packaging could be added to the Productdimension at run-time. To do this a custom level called Packaging couldbe defined from SKU within the ancestor level Size by combining SKUs ofsimilar packaging into one member. Chocolate Chip and Macadamia Nutcould be combined into one member of the Packaging level and OatmealRaisin could form another Packaging member.

Custom Group

Description

Custom group is a subset of members of dimension. Custom groups can bedefined by arbitrary selection, tree, or using expressions on datameasures dependent on this dimension alone.

Tree is defined by a position p (level member), a level S related(ancestor or descendent) to the level of the position x where the treestarts and a descendent level E of the start level where the tree ends.The level S can be above or below the level of the p. The custom groupcontains all members that are related to p belonging to the levels Xsuch that X is the same as or a descendent of level S and is the same asor a an ancestor of level E.

Set of trees or Range of trees are example of a Custom Group.

Custom Scope

Description

Scope is a set of non-overlapping members of a dimension. Two members ofa dimension are non-overlapping if they do not have any common children(other members of the same dimension belonging to the descendentlevels).

Set of members of a level and Range of members of a sequenced level areexamples of scope.

Data Measure

Description

A data measure is a property that is dependent on a subset of thedefined dimensions. The subset of dimensions on which a data measure isdependent is called its Dimensionality. Dimension aliases can be used ifa data measure is dependent on the same dimension more than once. Inother words, specification of more than one position of such dimensionsis needed to access the data stored in the measure.

Data measure yields a scalar value at an intersection of its dependentdimensions. The scalar value is undefined if the intersection is notstored subject to access methods discussed below. The intersection of asubset of dependent dimensions results in a multi-dimensional array ofvalues where the axes represent the dimensions that are not part of theintersection.

There are two kinds of data measures: Aggregatable and Non-aggregatable.

Another categorization for data measures storing numeric values isSigned and Unsigned where Unsigned data measures can not accept negativenumbers. In case of signed data measures negative numbers can be lockedand locking an intersection does not prevent negative number storage inthe measure.

Example

Unit Sales, Price, Actual Dollars, Forecast Units are all data measures.

(Unit Sales, 1000)—identifies the value for Unit Sales at theintersection [Eastern Region, Cookies Brand, Q1-1997].

The intersection [Eastern Region, Cookies Brand] results in aone-dimensional array of scalar values for all the time dimensionmembers.

Data Type

Description

The type of the scalar value to be stored at an intersection for a givendata measure could be one of Numeric (specified as maximum number ofdigits stored and implied number of decimal places), Integer (maximumnumber of digits and implied number of trailing zeros not to be stored),Boolean, String & BLOB (Binary Large Object). A Reference data type willalso be supported to store references to application objects.

Example

Price can be defined as Numeric with a maximum of 5 digits (decimalpoint not included) and 2 decimal places. Price is stored in cents anddivided by 100 while reading.

Revenue can be Integer with 5 digits stored and 3 trailing zeros notstored. In other words, Revenue is stored in thousands and amultiplication factor 1000 is applied while reading.

Unit Sales can be defined as Integer with a maximum of 10 digits.

Comments can be defined as BLOBs where any binary object (pictures,documents etc.) can be stored at dimension intersections.

Aggregatable Data Measure

Description

An aggregatable data measure is defined with an anchor (lower bound)level in each of its dependent dimensions. The data measure can havedata values for a subset of intersections of lower bound level membersof every dependent dimension. It also has values defined for allintersections in all the aggregation paths of the lower bound levelmembers.

Optional levels called the “aggregation termination” levels can bespecified for every dependent dimension. These levels should be same asor ancestors of the lower bound level. Only those intersections alongthe aggregation paths that contain members of the aggregationtermination (upper bound) levels are defined. The intersectionscontaining members of ancestor levels of the aggregation terminationlevels are not defined.

Aggregatable data measures use an aggregation method to generate datafor an ancestor level member given data for all its children at one ofits descendent levels. Examples of aggregation methods are sum, weightedaverage, min, max, or, and etc. In case of weighted average, asummarizable data measure is needed to provide weights.

Aggregatable data measures can only have data of types Numeric, Integeror Boolean and the aggregation integrity is maintained by the system.

An incremental aggregation method is one which can be used to computethe value at the parent knowing the old value of the parent, old valueof the child, and the new value of the child when one of the children ischanged. This lets us compute the parent value by accessing only thechanged children. Sum and weighted average are examples of incrementalaggregation methods.

If an odd number of dimension components of an intersection x, areflagged “not to be stored” (see section Hierarchical Dimension), theintersection is not stored. Otherwise, the intersection is stored. If anintersection within the defined bounds of a data measure is not stored,access routines will compute the value at run time from the childrenpositions.

When a “to be stored” intersection becomes “not to be stored”, theintersection is immediately flagged deleted and is not accessed. On theother hand, when a “to be stored” intersection becomes “not to bestored”, it is added when the intersection is needed.

Example

Unit Sales could be defined as a summarizable data measure dependent on.Product and Geography dimensions with SKU and Region as the anchorlevels. If the Unit Sales data values for [Chocolate Chip, EasternRegion], [Oatmeal Raisin, Eastern Region] and [Macadamia Nut, EasternRegion] intersections are 100,200 and 300 respectively, then systemshould ensure that the value at [Cookies Brand, Eastern Region]intersection is 600.

If the aggregation termination level is not specified, thenintersections corresponding to the level combinations [Size, Region] and[AllProducts, Region] will be generated.

If the aggregation termination level is specified as Brand in theProduct dimension then the combinations of [Size, AllProducts] will notbe generated.

Allocatable Data Measure

Description

An allocatable data measure is an incrementally aggregatable datameasure which has a dis-aggregation method defined. Dis-aggregation isthe process of arriving at unique children values given a parent valueand a profile (set of basis values).

Allocatable data measures can only have data of types Numeric, Integerand support aggregation methods: sum and weighted average.

Updating an Allocatable Data Measure

Description

When a data element is modified at an intersection, the change needs tobe distributed down to the anchor level intersections within its scopebased on a pre-defined criterion. The change also needs to be propagatedto all ancestor member intersections of all defined aggregation pathsfor the data measure.

Update of anchor level intersections by multiple users/threads should besequential, such that no two intersections common to any two updatescopes are hit in different sequence. If intersection A and intersectionB are in the update scopes of thread T1 and thread T2 and if T1 changesintersection A first, T1 should change intersection B also first andvice versa. This will result in one of the updates completely overridingthe other and the end result will be consistent with at least one of theupdates when the scopes overlap.

An online re-synchronization operation is provided since incrementalupdate can only preserve “aggregation integrity” if it is apre-condition.

An update operation can have the following parameters for each dependentdimension: A scope, an input level (below the level of the scope) atwhich the external input is obtained. A level combination sequence tofollow (sequence needs to monotonically progress towards the lower boundlevel combination) during the allocation can be defined if anapplication provided basis needs to be used for allocation along thepath. A default data base specified basis is used beyond the specifiedpath to reach the lower bound combination.

Example

In the aggregatable data measure example, if the Unit Sales data elementcorresponding to the [Cookies Brand, Eastern Region] intersection ischanged from 600 to 900 the new values for [[Chocolate Chip, EasternRegion], [Oatmeal Raisin, Eastern Region] and [Macadamia Nut, EasternRegion] intersections should be 150, 300 and 450. The pre-definedcriterion in this example is “preserve the existing proportions”.

The update to the [SKU, Region] intersections also causes the dataelements corresponding to the [Size, Region] intersections to be updatedbecause the members of the Size level fall in the relevant aggregationpath for SKU.

If multiple users attempt to change the values for any of theintersections at [SKU, Region], the values at the [Brand, Region]intersections should be the sum of the [SKU, Region] intersections afterthe operations are complete and no other updates are being performed onthe data measure.

Non-Allocatable Data Measure

Description

A Non-allocatable data measure is an aggregatable data measure with nodis-aggregation allowed. In other words, all updates are performed atthe lower bound level combination and aggregated to higher levels.Aggregation is performed without assuming any pre-existing aggregationintegrity and requires accessing all children of each node affected byan update.

Non-allocatable data measures can only have data of types Numeric,Integer and Boolean and support the aggregation methods: max, min, and,or etc. User defined aggregation can defined for pre-defined types.

User defined data types with user specified operations are supported.The operation should include a pre-defined aggregation method which isused to perform the aggregations along the dimensions. A user definedoperation can be performed on a custom group of intersections.

Updating a Non-allocatable Data Measure

Description

A Non-allocatable data measure can only be modified at the lower boundlevel combination directly. The change is then propagated upwards usingthe aggregation method to all the parent nodes.

Update of all intersections by multiple users/threads should besequential, such that no two intersections common to any two updatescopes are hit in different sequence. If intersection A and intersectionB are in the update scopes of thread T1 and thread T2 and if T1 changesintersection A first, T1 should change intersection B also first andvice versa. This will result in one of the updates completely overridingthe other and the end result will be consistent with at least one of theupdates when the scopes overlap.

An update operation needs to have just a scope specified for eachdependent dimension. The external input is obtained at the lowest leveland the data measure is updated.

Non-Aggregatable Data Measure

Description

A non-aggregatable data measure can have data at a subset ofintersections of its dependent dimensions. Data at each intersection isindependent of the data at other intersections. In other words, there isno “aggregation integrity” that needs to be maintained fornon-aggregatable data measures. Updating a value for thenon-aggregatable item updates the data element at the currentintersection.

Users can add or delete any intersection subject to access restrictions.If an intersection does not exist, a higher level intersection isaccessed. This is repeated in a user specified order until anintersection is found. On the other hand, update operation is performedonly if the intersection exists.

Non-aggregatable data measures can have any of the data types listed inthe data type section. User defined data types with user specifiedoperations are supported. A user defined operation can be performed on acustom group of intersections.

Example

Price, Seasonally are non-aggregatable data measures.

Seasonality can be defined as a non-aggregatable data measure at [Brand,Region], [SKU Region] and [SKU Territory] level intersections.

Price can be defined as a non-aggregatable data measure at [SKU Region]level intersections. The access order could be defined as just Region.This will retrieve valid data from the Region level for allintersections at all descendent levels of Region. Data is not availableat ancestor levels of Region and SKU.

Updating a Non-Aggregatable Data Measure

Description

All positions in a non-aggregatable data measure are independentlyupdate-able. A custom group can be specified for every dependentdimension. All existing intersections of the members of custom group areindependently updated using application supplied data.

Data Cube

Description

A data cube is a set of similar data measures (all allocatable, allnon-allocatable or all non-aggregatable) with the same dimensionability.A subset of the dimensions of the cube can be designated as dense.

A subset of all possible combinations of the rest of the dimensions isidentified for the cube. The storage or non-storage of an intersectionis defined by the types of data measures stored in the cube.

If the cube contains a non-aggregatable data measures, a custom group ofmembers of each dense dimension is identified and all the intersectionsof these custom groups are stored in the cube.

In case of a non-allocatable data measures, a scope of members at thelower bound level of each data measure is identified for each densedimension and all intersections of these scopes are stored in the cube.In other words, only the lowest level data is stored and higher levelsare generated at run time.

In case of allocatable data measures, a scope of members at one of thelevels at which the data measure is defined is identified for each densedimension and all intersections of these scopes are stored in the cube.All the levels at which the data measure is defined, and are reachablefrom the stored level either through aggregation or dis-aggregation, aregenerated at run time.

What-If Update

Description

Any update operation can be saved as a what-if update without commit.The what-if update when active acts as a filter applied to the actualdata that is retrieved. The data resulting will appear as if the updatewas committed to the database.

The what-if update can be committed if required.

This will only be implemented if the performance of a what-if update isbetter than a regular update (commit).

Example

In the updating aggregatable data measure example if a what-if update of900 is saved to the [Cookies Brand, Eastern Region] intersection, theactual data in the database corresponding to this intersection willstill be 600 whereas when the what-if is applied it will appear as ifthe data is 900. The data at the [Chocolate Chip, Eastern Region]intersection will appear to be 150 even though it really is 100.

Locking a Dimension Intersection

Description

Locking a dimension intersection of an allocatable data measure shouldkeep the value stored at the intersection unchanged while distributingthe data from a higher level or propagating the change from a lowerlevel.

Locking a dimension intersection of a non-allocatable data measure isequivalent to locking all the lower bound intersections within the scopeof the locked intersection.

Locking/unlocking an intersection of a non-aggregatable data measure istreated as locking a single intersection. In other words, updates willskip the locked intersection.

Example

In the aggregatable data measure example, it the Unit Sales data elementcorresponding to the [Chocolate Chip, Eastern Region] intersection islocked at 100 and if data element at [Cookies Brand, Eastern Region]intersection is changed from 600 to 900 the new values for [OatmealRaisin, Eastern Region] and [Macadamia Nut, Eastern Region]intersections should be 320 and 480. The existing proportions of theunlocked intersections are preserved while keeping the value at thelocked intersection unchanged.

Mapping Dimension Members

Description

Members of two dimensions, target and source, can be mapped to eachother. Target and source can be defined using aliases and can refer tothe same defined dimension. Mapping defines the source members used tocompute a target member and is used in Relations and Virtual DataMeasures.

Mapping can be defined using a relative definition: ancestor, children,siblings etc. Mapping can also be defined by enumerating the sourcetarget pairs.

Relation on a Data Measure

Description

A relation can be defined on a “target” data measure by associating anexpression with it. The expression can contain one or more “source” datameasures. Each data measure involved in the relation (including thetarget) has a dimensionality defined using aliases. The dimensionalityof the target and source measures need not be the same. In other words,different aliases can be used for different measures to specifydimensionality. For each source dimension that do not match a targetdimension (matching is by name and not by the defined dimension itrefers to), the members of one of the target dimensions will need to bemapped to the members of the source dimension.

The relation expression can contain constants, source data measures,unary, binary and aggregation operators which are pre-defined.Aggregation operators are needed when a single target dimension memberis mapped to multiple source dimension members. The relation definitionalso specifies whether the relation triggering is manual or automatic.

In case of automatic triggering, the relation triggering is transitive.In other words, if A is a source of a relation on B and B is a source ofa relation on C and both are triggered automatically, both B and C areupdated when A is modified. Cycles are detected by the system when therelations to be executed are being collected: if the target of arelation to be triggered is involve (either as source or target) in therelations already in the list of relations to be executed, then thisrelation is not added to the list.

Relation updates are non-incremental and all source positions requiredto compute a target position are accessed. Since any arbitraryexpression can be used and reverse relations cannot be verified by thesystem, relation integrity is not guaranteed by the system.

If the target is an allocatable data measure, a level is identified foreach dimension of the target at which the relation is computed. In caseof non-allocatable data measures, the anchor level is used forcomputation. In case of non-aggregatable measures, all intersections areindependently computed.

Example

The Units data measure could have a relation defined on it with theexpressions (GrossDollars/Price; GrossTons/Weight). Every update to theGrossDollars, Price, GrossTons or Weight data measures will re-calculatethe Units data measure.

Virtual Data Measures

Description

Virtual data measure is a data measure which has no storage of its own,and its value at an intersection is computed at run time. It isnon-modifiable. Each virtual measure is defined by an expression whichis used to compute its value. The expression can contain constants,stored data measures, unary, binary and aggregation operators which arepre-defined. Expressions cannot contain other virtual measures.

The dimensionality of the virtual and stored measures need not be thesame. Aliases can be used to refer to the same internal dimensions andeach alias is treated as a different dimension for access. For eachdimension of the stored data measure (source dimension) that do notmatch a dimension of the virtual measure (target dimension), the membersof one of the target dimensions will need to be mapped to the members ofthe source dimension. Aggregation operators are used when a singletarget dimension member is mapped to multiple source dimension members.

Dimension Member Realignment

Description

Members of hierarchical dimensions can be realigned in one of thefollowing two ways:

-   1. Moving: A subset of parents of a member of a dimension can be    changed. The whole subtree under the member is moved from one set of    parents to another set. The member specification can include a    subset of its uniqueness thus moving multiple members possibly    merging them into one target member. The target member specification    can include a different subset of its uniqueness. The data    corresponding to all the intersections that contain the moved member    is moved.-   2. Deleting: An existing member can be deleted. The whole subtree    under the member is deleted. The member specification can include a    subset of its uniqueness thus deleting multiple members. The data    corresponding to all the intersections that contain the deleted    member is voided.

An option to reject the operation is provided if the dimension memberspecification identifies multiple members of the dimension. All datameasures depending on the realigned dimension are impacted.

The realignment operation should preserve the aggregation integrity ofall aggregatable data measures. When deleting existing members, thesystem should ensure that there are no existing intersections foraggregatable data measures that do not have corresponding anchor-levelintersections.

Realignment may require exclusive access to the data base and may beallowed only by the administrative client.

Example

Consider the geography dimension with levels: National, District, DC,Account. The relations defined are: National>District, National>DC,DC>Account and District>Account. The uniqueness of Account is {DC,District, National}.

If the Account realignment specification is Move Account=Kmart fromDC=New York and District=GreatLakes to District=TriState. All Kmartaccounts belonging to NewYork(DC) and GreatLakes(District) are moved toNew York(DC) and TriState(District).

If the Account realignment specification is Move Account=Kmart fromDC=New York to DC=Michigan and District=TriState. All Kmart accountsbelonging to NewYork(DC) and any District are moved to Michigan(DC) andTriState(District).

If the Account realignment specification is Move Account=Kmart fromDC=New York to Account=Walmart and District=TriState. All Kmart accountsbelonging to NewYork(DC) and any District are moved to WalMart(Account)within New York(DC) and TriState(District).

Moving Data Measure Intersection

Description

The dimension intersections within Data Measures can be realigned. Thesource and target specifications can contain different subsets ofdimensions. A subset of data measures dependent on the superset of theunion of the source and target dimension sets can be realigned. Thedimension specifications are the same as in case of Dimension MemberRealignment.

If a dependent dimension of an Aggregatable data measure is notspecified, all its members at the anchor level are considered as part ofthe source specification. If a dependent dimension of a Non-aggregatabledata measure is not specified, all its members are considered as part ofthe source specification.

In case of Aggregatable Data Measures, the subtrees under all theintersections are moved along with the intersections. Otherwise, onlyintersections specified are moved. Dimension members may be added, butare never deleted.

Example

UnitSales is an Aggregatable Data Measure dependent on Geography,Product and Time and Price is a Non-aggregatable data measure dependenton Product, Customer and Time.

If an intersection of Product and Time is to be moved and the targetspecified contains the member of Product dimension alone, both UnitSalesand Price or either one of them can be realigned. In case of UnitSales,the subtree under the Product part of the intersection is realigned forthe given Time for all Geography positions. In case of Price, Productpart of the intersection (not the subtree) is realigned for the givenTime for all Customer positions.

If an intersection of Geography and Product is to be moved and thetarget contains intersections of Geography and Time dimensions, onlyUnitSales can be realigned. The subtree under the Product part of theintersection is realigned for the given Geography, and all Time membersof the level specified in the target are realigned into one specifiedTime member and the subtrees under the all members are merged under one.

Deleting Intersections from Data Measures

Description

The intersection specification can contain a subset of dimensions. Theintersections can be deleted from a subset of data measures whodimensionality includes all dimensions in the specification. Thedimension specifications are the same as in case of Dimension MemberRealignment. All intersections within the selected data measuresmatching the specification are deleted.

If a dependent dimension of an Aggregatable data measure is notspecified, all its members at the anchor level are considered as part ofthe specification. If a dependent dimension of a Non-aggregatable datameasure is not specified, all its members are considered as part of thespecification.

In case of Aggregatable Data Measures, the subtrees under all theintersections are removed along with the intersections. Otherwise, onlyintersections specified are removed.

Security

Description

An object storage model is to be defined for OLAP. Access permissionscan be set for each object (Dimensions, Levels, Members, Data Measures,Data at cell level) by user, group etc. The access privileges can beRead, Modify, Add, Delete, etc.

Distributed OLAP

Description

The distribution of data is done subject to the following guidelines:

-   -   Separate the dimension view exposed to the applications from the        data storage    -   Store data such that all updates, locking can be performed        without having to access data from other sub-cubes    -   Provide mechanisms to minimize cross sub-cube data access while        computing relations and virtual data measures

Multi-dimensional data can be distributed into sub-cubes along adimension. Each sub-cube can be further distributed along the same or adifferent dimension.

The following are the two ways of distributing sub-cubes along adimension:

-   1) Partitioning the levels of a dimension    -   The levels are partitioned into two sets with one of the levels        belonging to both sets. All paths from a level belonging to one        set to a level belonging to the second set need to pass through        the common level. In a set, if a level is a parent(child) of the        common level, there does not exist a level in the set which is a        child(parent) of the common level.-   2) Partitioning the instances of a dimension    -   The partitioning of the instances is done at a level which is a        parent of all levels in the sub-cube being partitioned. In other        words, it is the highest level of the sub-cube. All levels        belong to both sub-cubes.

A sub-cube can also be partitioned by data measures also. Relations andvirtual data measure definitions cannot span multiple sub-cubes when thepartitioning is done by data measures.

All data access is synchronous. If the process servicing a sub-cube isdown, the data cannot be accessed.

Propagation of updates may need to be asynchronous. Even if the processservicing the sub-cube is down, the data needs to be guaranteed to beupdated eventually.

The allocation paths are restricted such that the lowest level of eachsub-cube is in the allocation path, if the allocation path spansmultiple sub-cubes. This is required to limit the allocate and followingaggregation operation to a sub-cube.

Configuration

Description

All programmed limits: number of dimensions, number of data measures,number of levels, number of concurrent users etc. should beconfigurable. In other words, these limits should be modifiable withoutrecompiling the system.

In summary, the method of the present invention involves settingthreshold values for different levels within a multi-dimensionaldatabase having data organized into hierarchical levels. By determiningwhether to aggregate selected data measures for each intersection of thedatabase, following rules related to these threshold values, databaseaccess is rendered reasonably efficient while not requiring a worst casemaximum storage size. Because it is relatively easy to adjust thesethreshold levels, it is relatively easy for an application to be tunedto trade off between access time for the database and data storage spacerequirements.

In general, lowering threshold levels will cause more data to be storedin the database, raising storage requirements but decreasing accesstime. In a complementary manner, raising threshold levels will tend toincrease access times because a larger number of data calculations willbe made at runtime. However, in return less storage is required for thedatabase. By providing adjustable threshold levels, the performance ofany given database can be easily optimized to suit any desiredapplication. Adjustment of threshold levels will require that newdeterminations be made as to which intersections store precalculateddata, but no major restructuring or recompilation is needed.

In addition to the above trade off, storing less data by raisingthreshold levels will cause updates to execute faster. This results fromthe decreased number of calculations to be made at update time.

While the invention has been particularly shown and described withreference to a preferred embodiment, it will be understood by thoseskilled in the art that various changes in form and detail may be madetherein without departing from the spirit and scope of the invention.

1. In a multidimensional database in which data is provided inhierarchical levels, a method comprising the steps of: setting athreshold value for a selected level; for each member of the selectedlevel that has more descendants than the level threshold value,aggregating a data measure for all descendants of the member, andstoring the aggregated value with the member.
 2. The method of claim 1,wherein for members of the selected level having less descendants thanthe threshold value, not aggregating or storing the data measure withthe member.
 3. The method of claim 1, wherein more than one level isassigned a threshold.
 4. The method of claim 3, wherein at least twodistinct levels have thresholds set at different values.
 5. The methodof claim 1, wherein the step of aggregating the data measure comprisesthe step of precalculating the data measure for all descendants of themember, and the storing step comprises storing the precalculated datameasure with the member.
 6. The method of claim 5, wherein theprecalculating step comprises summing the data measure for alldescendants of the member.
 7. The method of claim 1, wherein the step ofaggregating the data measure comprises summing the data measure for alldescendants of the member.
 8. The method of claim 1 further comprisingthe step of, for each member of the database: defining a count for thatmember to be 1 if the member has no children; defining a count for thatmember to be equal to the sum of counts for all children of the member;and resetting the count for a member to 1 if its count exceeds thethreshold, and contemporaneously aggregating the data measure for alldescendants of the member.
 9. A database system, comprising: amultidimensional database having a plurality of database intersections,the intersections having data organized as hierarchical levels havingmembers; for each member, a count value indicating a number ofdescendant members required to compute the member, a flag indicatingwhether the member is stored or not stored, and a value indicating achild level of the member to be accessed; for each level, a thresholdvalue; and a procedure for, when count values for a member's childrensum to a value greater than the threshold value for the member, storinga precalculated aggregated data measure for the descendants of themember into the member.
 10. The system of claim 9, wherein each levelhas a threshold value that is independent of the threshold values forthe remaining levels.
 11. Software for aggregating multidimensional dataprovided in hierarchical levels, the software embodied in acomputer-readable medium and, when executed, operable to: receive athreshold value for a selected level; for each member of the selectedlevel that has more descendants than the level threshold value,aggregating a data measure for all descendants of the member, andstoring the aggregated value with the member.
 12. The software of claim11, operable to not aggregate or store the data measure with the memberfor members of the selected level having less descendants than thethreshold value.
 13. The software of claim 11, wherein a threshold valueis received for more than one level.
 14. The software of claim 13,wherein at least two distinct levels have thresholds set at differentvalues.
 15. The software of claim 11, wherein: aggregating the datameasure comprises precalculating the data measure for all descendants ofthe member; and storing the aggregated value comprises storing theprecalculated data measure with the member.
 16. The software of claim15, wherein precalculating comprises summing the data measure for alldescendants of the member.
 17. The software of claim 11, whereinaggregating the data measure comprises summing the data measure for alldescendants of the member.
 18. The software of claim 11 further operableto, for each member of the database: define a count for that member tobe 1 if the member has no children; define a count for that member to beequal to the sum of counts for all children of the member; and reset thecount for a member to 1 if its count exceeds the threshold, andcontemporaneously aggregating the data measure for all descendants ofthe member.